
    Geometry of Policy Improvement

    We investigate the geometry of optimal memoryless, time-independent decision making in relation to the amount of information that the acting agent has about the state of the system. We show that the expected long-term reward, discounted or per time step, is maximized by policies that randomize among at most k actions whenever at most k world states are consistent with the agent's observation. Moreover, we show that the expected reward per time step can be studied in terms of the expected discounted reward. Our main tool is a geometric version of the policy improvement lemma, which identifies a polyhedral cone of policy changes in which the state value function increases for all states.
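
    As a concrete reference point for the policy improvement lemma mentioned above, the sketch below runs classical tabular policy iteration on a toy two-state MDP; the transition and reward arrays are illustrative assumptions, and the paper's geometric, partially observed setting is not reproduced here.

```python
import numpy as np

# Minimal tabular policy iteration illustrating the classical policy
# improvement step (the paper studies a geometric generalization of it).
# The MDP below (2 states, 2 actions) is an illustrative assumption.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                 # R[s, a] expected rewards
              [0.0, 2.0]])
gamma = 0.9

policy = np.zeros(2, dtype=int)           # start from an arbitrary policy
for _ in range(100):
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(2), policy]
    R_pi = R[np.arange(2), policy]
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily w.r.t. the one-step lookahead.
    Q = R + gamma * P @ V                 # Q[s, a]
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break                             # fixed point: policy is optimal
    policy = new_policy                   # lemma: V increases in every state
```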

    Two semi-Lagrangian fast methods for Hamilton-Jacobi-Bellman equations

    In this paper we apply the Fast Iterative Method (FIM) to the solution of general Hamilton-Jacobi-Bellman (HJB) equations and compare the results with an accelerated version of the Fast Sweeping Method (FSM). We find that FIM can indeed be used to solve HJB equations with no relevant modifications with respect to the original algorithm proposed for the eikonal equation, and that it outperforms FSM in many cases. Observing the evolution of the active list of nodes for FIM, we recover another numerical validation of the arguments recently discussed in [Cacace et al., SISC 36 (2014), A570-A587] about the impossibility of creating local single-pass methods for HJB equations.
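
    For readers unfamiliar with FIM, here is a minimal sketch of its active-list iteration applied to the plain eikonal equation |grad u| = 1 on a 2-D grid (the special case the method was originally designed for); the grid size, tolerances, and set-based list management are simplifying assumptions, not the accelerated HJB solver compared in the paper.

```python
import numpy as np

def godunov_update(u, i, j, h):
    # Upwind (Godunov) local solver for |grad u| = 1 at node (i, j).
    a = min(u[i - 1, j], u[i + 1, j])
    b = min(u[i, j - 1], u[i, j + 1])
    if abs(a - b) >= h:
        return min(a, b) + h
    return 0.5 * (a + b + np.sqrt(2.0 * h * h - (a - b) ** 2))

def fim_eikonal(n=65):
    h = 1.0 / (n - 1)
    u = np.full((n + 2, n + 2), 1e10)       # padded border acts as +infinity
    src = (n // 2 + 1, n // 2 + 1)
    u[src] = 0.0                            # point source at the center
    inside = lambda i, j: 1 <= i <= n and 1 <= j <= n
    nbrs = lambda i, j: ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
    active = {p for p in nbrs(*src) if inside(*p)}
    while active:
        nxt = set()
        for (i, j) in active:
            old = u[i, j]
            u[i, j] = godunov_update(u, i, j, h)
            if abs(old - u[i, j]) < 1e-12:
                # Converged: deactivate, and wake any neighbour whose value
                # would decrease. This re-activation is what lets FIM revisit
                # nodes, in contrast to a strict single-pass method.
                for (k, l) in nbrs(i, j):
                    if inside(k, l) and godunov_update(u, k, l, h) < u[k, l] - 1e-12:
                        nxt.add((k, l))
            else:
                nxt.add((i, j))             # still changing: keep active
        active = nxt
    return u[1:-1, 1:-1]

dist = fim_eikonal()                        # approximates distance to the source
```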

    Evolutionary game of coalition building under external pressure

    We study the fragmentation-coagulation (or merging and splitting) evolutionary control model introduced recently by one of the authors, in which N small players can form coalitions to resist the pressure exerted by the principal. It is a Markov chain in continuous time and the players have a common reward to optimize. We study the behavior as N grows and show that the problem converges to a (one-player) deterministic optimization problem in continuous time, in an infinite-dimensional state space.
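
    To make the pre-limit object concrete, the sketch below runs a Gillespie-style simulation of a toy merging-and-splitting chain on N players; the merge and split rates, and the absence of any control or reward, are illustrative assumptions rather than the controlled model studied in the paper.

```python
import random

def simulate_coalitions(N=100, T=10.0, merge_rate=1.0, split_rate=0.5, seed=0):
    # Toy fragmentation-coagulation chain: any pair of coalitions merges at
    # rate merge_rate; any coalition of size > 1 sheds one member at rate
    # split_rate. Rates are illustrative, not the paper's controlled model.
    rng = random.Random(seed)
    coalitions = [1] * N                  # start from N singletons
    t = 0.0
    while True:
        k = len(coalitions)
        w_merge = merge_rate * k * (k - 1) / 2
        w_split = split_rate * sum(1 for c in coalitions if c > 1)
        total = w_merge + w_split
        if total == 0:
            break
        t += rng.expovariate(total)       # time to the next event
        if t > T:
            break
        if rng.random() < w_merge / total:
            i, j = rng.sample(range(k), 2)
            coalitions[i] += coalitions[j]
            coalitions.pop(j)
        else:
            i = rng.choice([idx for idx, c in enumerate(coalitions) if c > 1])
            coalitions[i] -= 1
            coalitions.append(1)
    return sorted(coalitions, reverse=True)

print(simulate_coalitions())              # coalition sizes at time T
```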

    Pseudorehearsal in value function approximation

    Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole-balancing task. We find that pseudorehearsal seems to assist learning even in such a simple problem, given proper initialization of the rehearsal parameters.
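
    A minimal sketch of the pseudorehearsal idea, assuming a toy linear Q-approximator and stand-ins for the environment: pseudo-items are random inputs paired with the network's own current outputs, and replaying them alongside real TD updates is what counteracts forgetting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Q approximator: Q(s, a) = w[a] . phi(s). Feature and action
# counts are illustrative assumptions, not the paper's pole-balancing setup.
n_features, n_actions = 8, 2
w = rng.normal(scale=0.1, size=(n_actions, n_features))

def make_pseudoitems(n=32):
    # Pseudorehearsal: sample random inputs and record the *current*
    # approximator's outputs as targets, so later updates preserve them.
    phis = rng.uniform(-1, 1, size=(n, n_features))
    return phis, phis @ w.T

def q_update(phi, a, target, lr=0.05):
    w[a] += lr * (target - w[a] @ phi) * phi

pseudo_phis, pseudo_targets = make_pseudoitems()
for step in range(100):
    phi = rng.uniform(-1, 1, size=n_features)  # stand-in for an observed state
    a = int(np.argmax(w @ phi))
    td_target = 1.0                            # stand-in for r + gamma * max Q(s')
    q_update(phi, a, td_target)
    # Rehearse a few pseudo-items after every real update.
    for p, tgt in zip(pseudo_phis, pseudo_targets):
        for act in range(n_actions):
            q_update(p, act, tgt[act], lr=0.01)
```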

    Exploring Graphs with Time Constraints by Unreliable Collections of Mobile Robots

    A graph environment must be explored by a collection of mobile robots. Some of the robots, unknown in advance, may turn out to be unreliable. The graph is weighted and each node is assigned a deadline. The exploration is successful if each node of the graph is visited before its deadline by a reliable robot. The edge weight corresponds to the time needed by a robot to traverse the edge. Given the number of robots which may crash, is it possible to design an algorithm which will always guarantee the exploration, independently of the adversary's choice of the subset of unreliable robots? We find the optimal time in which the graph may be explored. Our approach also yields the maximal number of robots which may turn out to be unreliable while the graph is still guaranteed to be explored. We concentrate on line graphs and rings, for which we give positive results. We start with collections involving only reliable robots, giving algorithms that find the optimal exploration times both when the robots are assigned fixed initial positions and when such starting positions may be determined by the algorithm. We then extend our consideration to the case when some number of robots may be unreliable. Our most surprising result is that the line exploration problem with robots at given positions, some of which may be crash-faulty, is NP-hard. The same problem has polynomial solutions for a ring and for the case when the initial robots' positions on the line are arbitrary. The exploration problem is also shown to be NP-hard for star graphs, even when the team consists of only two reliable robots.
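
    To make the decision problem concrete, here is a brute-force feasibility check for the single-robot case on a line; the unit-speed model and exhaustive search over visit orders are illustrative assumptions, not the paper's optimal algorithms.

```python
from itertools import permutations

def line_exploration_feasible(start, nodes, deadlines):
    # Brute-force check (small instances only): can one reliable robot
    # starting at `start` on a line visit every node before its deadline?
    # nodes[i] is a coordinate, deadlines[i] the latest allowed visit time,
    # and traversal time equals distance (unit speed).
    for order in permutations(range(len(nodes))):
        t, pos, ok = 0.0, start, True
        for i in order:
            t += abs(nodes[i] - pos)
            pos = nodes[i]
            if t > deadlines[i]:
                ok = False
                break
        if ok:
            return True
    return False

# Two nodes on opposite sides of the start: the second deadline decides.
print(line_exploration_feasible(0.0, [-1.0, 2.0], [1.0, 4.0]))   # True
print(line_exploration_feasible(0.0, [-1.0, 2.0], [1.0, 3.0]))   # False
```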

    Social learning against data falsification in sensor networks

    Sensor networks generate large amounts of geographically distributed data. The conventional approach to exploiting this data is to first gather it at a special node that then performs processing and inference. However, what happens if this node is destroyed, or even worse, if it is hijacked? To explore this problem, we consider a smart attacker who can take control of critical nodes within the network and use them to inject false information. In order to face this critical security threat, we propose a novel scheme that enables data aggregation and decision-making over networks based on social learning, where the sensor nodes act in a way that resembles how agents make decisions in social networks. Our results suggest that social learning enables high network resilience, even when a significant portion of the nodes has been compromised by the attacker.
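
    A minimal sketch of the flavour of scheme considered, assuming a simplified, non-Bayesian social-learning chain: each sensor fuses the log-likelihood ratio of a private Gaussian signal with a tally of earlier reported decisions, while compromised nodes report the opposite of what they decide. All parameters and the fusion rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_chain(theta, n_nodes=50, sigma=2.0, p_compromised=0.2, w=0.1):
    # Sensors decide a binary hypothesis theta in {0, 1} in sequence.
    # Each fuses its private signal's log-likelihood ratio with a
    # discounted tally of previously reported decisions.
    compromised = rng.random(n_nodes) < p_compromised
    reports = []
    for k in range(n_nodes):
        x = rng.normal(loc=theta, scale=sigma)
        llr = (x - 0.5) / sigma**2          # LLR for theta=1 vs theta=0
        tally = sum(1 if r else -1 for r in reports)
        decision = (llr + w * tally) > 0
        # Compromised nodes inject false information by flipping reports.
        reports.append((not decision) if compromised[k] else decision)
    return reports

# Resilience check: fraction of runs whose final report is correct
# even though ~20% of nodes are compromised.
runs = [run_chain(theta=1) for _ in range(200)]
print(np.mean([r[-1] for r in runs]))
```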

    Robust Markov Decision Processes


    A Semi-Lagrangian scheme for a modified version of the Hughes model for pedestrian flow

    In this paper we present a Semi-Lagrangian scheme for a regularized version of the Hughes model for pedestrian flow. Hughes originally proposed a coupled nonlinear PDE system describing the evolution of a large pedestrian group trying to exit a domain as fast as possible. The original model couples a conservation law for the pedestrian density with an eikonal equation that determines the weighted distance to the exit. We consider this model in the presence of small diffusion and discuss the numerical analysis of the proposed Semi-Lagrangian scheme. Furthermore, we illustrate the effect of small diffusion on the exit time with various numerical experiments.
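
    As a generic illustration of the semi-Lagrangian idea (not the paper's scheme for the coupled Hughes system), the sketch below advances a 1-D advection-diffusion equation by tracing characteristics back and averaging two "stochastic feet", a standard two-point treatment of small diffusion.

```python
import numpy as np

def semi_lagrangian_step(u, x, a, eps, dt):
    # One semi-Lagrangian step for u_t + a u_x = eps * u_xx on a periodic
    # grid: trace the characteristic foot x - a*dt, then approximate the
    # diffusion by averaging the two feet x - a*dt +/- sqrt(2*eps*dt)
    # (a two-point quadrature for the underlying Brownian motion).
    L = x[-1] - x[0] + (x[1] - x[0])      # period length
    foot = x - a * dt
    delta = np.sqrt(2.0 * eps * dt)
    up = np.interp(foot + delta, x, u, period=L)
    dn = np.interp(foot - delta, x, u, period=L)
    return 0.5 * (up + dn)

# Advect and diffuse a bump across a periodic domain.
n = 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
u = np.exp(-200.0 * (x - 0.3) ** 2)
for _ in range(100):
    u = semi_lagrangian_step(u, x, a=1.0, eps=1e-3, dt=0.005)
```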

    Building collaboration in multi-agent systems using reinforcement learning

    © Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through the standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration is achieved among the agents via competition, where the agents are expected to balance their actions in such a way that none of them drifts away from the team and none intervenes in any fellow neighbour's territory. Particles are equipped with Q-learning for self-training, learning how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.
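
    A minimal sketch of the general idea, assuming a toy state/action/reward design (distance to the swarm centroid, attraction-scaling actions, reward peaked at a moderate radius); the paper's exact formulation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each particle learns, from a coarse state (too close / balanced / too far
# from the swarm centroid), an action scaling its attraction to the centroid.
# The reward favours a moderate distance: stay with the team without
# invading a neighbour's territory.
n_particles, n_states, n_actions = 20, 3, 3
pos = rng.uniform(-5, 5, size=(n_particles, 2))
vel = np.zeros((n_particles, 2))
Q = np.zeros((n_particles, n_states, n_actions))
gains = np.array([-0.5, 0.0, 0.5])        # actions: repel / ignore / attract
alpha, gamma, eps_greedy = 0.1, 0.9, 0.1
idx = np.arange(n_particles)

def state_of(d):
    return 0 if d < 1.0 else (1 if d < 3.0 else 2)

for step in range(500):
    centroid = pos.mean(axis=0)
    d = np.linalg.norm(pos - centroid, axis=1)
    s = np.array([state_of(di) for di in d])
    # Epsilon-greedy action selection per particle.
    a = np.where(rng.random(n_particles) < eps_greedy,
                 rng.integers(0, n_actions, n_particles),
                 Q[idx, s].argmax(axis=1))
    # PSO-style velocity update with the learned attraction gain.
    vel = 0.7 * vel + gains[a, None] * (centroid - pos)
    pos = pos + 0.1 * vel
    d2 = np.linalg.norm(pos - pos.mean(axis=0), axis=1)
    r = -np.abs(d2 - 2.0)                 # reward peaks at a "balanced" radius
    s2 = np.array([state_of(di) for di in d2])
    # Standard Q-learning update, one Q-table per particle.
    Q[idx, s, a] += alpha * (r + gamma * Q[idx, s2].max(axis=1) - Q[idx, s, a])
```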